Optimal Iterative Sketching Methods with the Subsampled Randomized Hadamard Transform
Random projections, or sketching, are widely used in many algorithmic and learning contexts. Here we study the performance of the iterative Hessian sketch for least-squares problems. By leveraging and extending recent results from random matrix theory on the limiting spectrum of matrices randomly projected with the subsampled randomized Hadamard transform and with truncated Haar matrices, we can study and compare the resulting algorithms to a level of precision that has not been possible before. Our technical contributions include a novel formula for the second moment of the inverse of projected matrices. We also find simple closed-form expressions for asymptotically optimal step-sizes and convergence rates. These show that the convergence rates for Haar and randomized Hadamard matrices are identical, and asymptotically improve upon Gaussian random projections. These techniques may be applied to other algorithms that employ randomized dimension reduction.
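To make the setting concrete, here is a minimal NumPy sketch of an iterative Hessian sketch for least squares with an SRHT embedding. It rests on several assumptions that are not taken from the paper: the SRHT is a common variant (random signs, normalized Walsh-Hadamard transform, uniform row sampling without replacement), the sketch is refreshed at every iteration, and the step size is fixed at 1.0 rather than set to the asymptotically optimal value derived in the paper. The names `fwht`, `srht_sketch` and `iterative_hessian_sketch` are hypothetical helpers introduced for illustration.

```python
import numpy as np


def fwht(X):
    """Unnormalized Walsh-Hadamard transform along axis 0 of a 2-D array.

    The number of rows must be a power of two.
    """
    n = X.shape[0]
    if n == 0 or n & (n - 1):
        raise ValueError("number of rows must be a power of two")
    X = X.copy()
    h = 1
    while h < n:
        # Butterfly step: combine the two halves of each block of 2h rows.
        X = X.reshape(n // (2 * h), 2, h, -1)
        top = X[:, 0] + X[:, 1]
        bot = X[:, 0] - X[:, 1]
        X = np.stack([top, bot], axis=1).reshape(n, -1)
        h *= 2
    return X


def srht_sketch(A, m, rng):
    """Return S @ A for an m x n SRHT matrix S (assumed variant, see lead-in)."""
    n = A.shape[0]
    signs = rng.choice([-1.0, 1.0], size=n)          # random sign flips D
    HDA = fwht(signs[:, None] * A) / np.sqrt(n)      # orthogonal transform H D A
    rows = rng.choice(n, size=m, replace=False)      # uniform row subsampling
    return np.sqrt(n / m) * HDA[rows]                # rescale so E[S^T S] = I


def iterative_hessian_sketch(A, b, m, num_iters=20, step_size=1.0, seed=0):
    """Sketched-Newton iterations for min_x ||A x - b||^2, starting from x = 0.

    step_size=1.0 is an illustrative choice, not the paper's optimal step size.
    """
    rng = np.random.default_rng(seed)
    x = np.zeros(A.shape[1])
    for _ in range(num_iters):
        SA = srht_sketch(A, m, rng)        # fresh sketch at every iteration
        H = SA.T @ SA                      # sketched Hessian (d x d)
        grad = A.T @ (A @ x - b)           # exact gradient (up to a factor of 2)
        x = x - step_size * np.linalg.solve(H, grad)
    return x


# Small synthetic problem; the sizes are arbitrary illustrative choices.
rng = np.random.default_rng(0)
n, d, m = 1024, 16, 256
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)

x_star = np.linalg.lstsq(A, b, rcond=None)[0]
x_ihs = iterative_hessian_sketch(A, b, m)

# Prediction error relative to the error of the starting point x = 0.
rel_err = np.linalg.norm(A @ (x_ihs - x_star)) / np.linalg.norm(A @ x_star)
print(f"relative prediction error: {rel_err:.2e}")
```

Only the d x d sketched Hessian is factored at each step, while the gradient is computed exactly; this is what lets the iterates approach the true least-squares solution even though each sketch has only m rows.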
Review for NeurIPS paper: Optimal Iterative Sketching Methods with the Subsampled Randomized Hadamard Transform
Additional Feedback: Please find below a list of questions and comments:
- 1) Did you experiment with applying the truncated Walsh-Hadamard transform (Ailon & Liberty, Discrete & Computational Geometry, 2009) when using the SRHT?
- 2) Lines 199-207: Could you comment on the fact that m is constrained to be greater than d? Would it be possible to achieve better performance with a smaller m? Is there any theory about this?
- 3) Could you please give the precise references claiming that m \approx d \log(d) is prescribed for state-of-the-art algorithms, and for which algorithm?
- 4) Below Theorem 4.1, there is an explanation of why the additional assumption relating \mathbb{E}[\Delta_0 \Delta_0^\top] and (1/d) I_d is a "mild assumption". I did not understand the provided argument. Is this assumption often/easily met?
- 5) Lines 222-223 and 264-265: it is mentioned that "SRHT [...] contains less randomness, but is more structured and faster to generate" than the Haar matrix.